GitHub Integration API
This document describes the GitHub integration API that enables repository analysis and contextual Q&A powered by a large language model. It supports:
Repository ingestion via a normalized GitHub URL
Context-aware question answering using repository summary, file tree, and content
Optional file attachment processing via a cloud generative AI SDK
Chat history integration for conversational context
Robust error handling and user-friendly messages for common failure modes
The API exposes a single endpoint that accepts a GitHub repository URL and a question, returning a Markdown-formatted answer derived from the repository context.
The GitHub integration spans several modules:
API router: defines the endpoint and request/response models
Service layer: orchestrates ingestion, optional file attachment processing, and LLM invocation
Prompt pipeline: constructs a structured prompt with repository context and guidelines
Tooling: converts a GitHub repository into a unified markdown-like context
Configuration: loads environment variables and logging
/api/genai/github"] API --> Router["GitHub Router
POST /"] Router --> Service["GitHubService"] Service --> Tool["GitHub Crawler
convert_github_repo_to_markdown"] Service --> Prompt["Prompt Chain
get_chain()"] Service --> LLM["LargeLanguageModel"] Service --> Config["Environment Config"] Tool --> Gitingest["gitingest
ingest_async/ingest"]
Endpoint: POST /api/genai/github
Request body: GitHubRequest (URL, question, optional chat history, optional attached file path)
Response body: GitHubResponse (content)
Authentication: Not enforced by the endpoint; clients should secure access as appropriate for their deployment
Rate limiting: Not implemented in the endpoint; consider upstream rate limiting and retries
The request lifecycle:
Client sends a POST request with a GitHub URL and question
Router validates presence of required fields and delegates to the service
Service normalizes the URL, ingests repository content, optionally attaches a file, and invokes the prompt chain
The prompt chain builds a structured prompt and queries the LLM
The response is returned as a Markdown-formatted string
Endpoint Definition
Method: POST
Path: /api/genai/github
Request JSON schema:
url: string (HTTP URL; must resolve to a GitHub repository)
question: string (required)
chat_history: array of objects (optional)
attached_file_path: string (optional; absolute path to a local file)
Response JSON schema:
content: string (Markdown-formatted answer)
Behavior highlights:
Validates presence of question and url
Returns structured error messages for invalid inputs or ingestion failures
Supports optional file attachment processing via a cloud generative AI SDK
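The request and response shapes above can be sketched as plain dataclasses. This is a minimal sketch: the actual API uses Pydantic models, and the specific validation rules shown here are assumptions based on the behavior highlights.

```python
from dataclasses import dataclass, field
from typing import Optional

@dataclass
class GitHubRequest:
    url: str
    question: str
    chat_history: list[dict] = field(default_factory=list)
    attached_file_path: Optional[str] = None

    def __post_init__(self) -> None:
        # Mirror the endpoint's presence checks: url and question are required.
        if not self.url or not self.url.startswith("http"):
            raise ValueError("url must be an HTTP URL pointing to a GitHub repository")
        if not self.question or not self.question.strip():
            raise ValueError("question is required")

@dataclass
class GitHubResponse:
    content: str  # Markdown-formatted answer
```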
Service Layer
Responsibilities:
Normalize GitHub URL to repository root
Ingest repository content (summary, tree, content)
Optionally attach a file and query a cloud generative AI SDK
Build and execute the prompt chain with repository context
Return user-friendly error messages for common failure modes
Key logic:
URL normalization strips non-repository path segments (e.g., commits, issues, pulls, tree, blob)
Repository content is truncated to a maximum character limit to fit within LLM context windows
Error handling differentiates between invalid URLs, inaccessible repositories, and token limit exceeded scenarios
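The URL normalization described above can be sketched as follows; the segment names come from the list above, while the function name and error message are illustrative:

```python
from urllib.parse import urlparse

def normalize_github_url(url: str) -> str:
    """Reduce a GitHub URL to its repository root: https://github.com/owner/repo."""
    parsed = urlparse(url)
    parts = [p for p in parsed.path.split("/") if p]
    if parsed.netloc != "github.com" or len(parts) < 2:
        raise ValueError("URL does not point to a valid GitHub repository root")
    owner, repo = parts[0], parts[1]
    # Anything beyond owner/repo (e.g. /tree/main, /blob/..., /issues/42,
    # /pulls, /commits) is a sub-page and gets stripped.
    return f"https://github.com/{owner}/{repo}"
```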
Prompt Pipeline
The prompt pipeline composes:
A system message instructing the model to answer solely from repository context
Repository summary, file tree, and relevant content
Optional chat history
A user question
Formatting guidelines for Markdown responses
The pipeline uses a LangChain chain with a prompt template and an LLM client, returning a string response.
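As a rough illustration of what the composed prompt contains, here is a plain-Python sketch; the real pipeline uses a LangChain prompt template, and the section labels and wording below are assumptions:

```python
from typing import Optional

def build_prompt(summary: str, tree: str, content: str,
                 question: str, chat_history: Optional[list] = None) -> str:
    """Compose a structured prompt from repository context, history, and question."""
    history = "\n".join(
        f"{turn['role']}: {turn['content']}" for turn in (chat_history or [])
    )
    return "\n\n".join([
        "You are a code assistant. Answer ONLY from the repository context below.",
        f"Repository summary:\n{summary}",
        f"File tree:\n{tree}",
        f"Relevant content:\n{content}",
        f"Chat history:\n{history}" if history else "Chat history: (none)",
        f"Question: {question}",
        "Format the answer as Markdown.",
    ])
```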
GitHub Crawler Tool
The crawler:
Normalizes GitHub URLs to repository root
Uses asynchronous ingestion when available, otherwise falls back to synchronous ingestion
Truncates repository content to a maximum length to fit within LLM context windows
Returns a structured content object containing summary, tree, and content
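The fallback-and-truncate behaviour can be sketched like this. The function names, result shape, and the 50,000-character limit are illustrative, not the crawler's actual values; the real tool wraps gitingest's ingest_async/ingest:

```python
import asyncio
from typing import Awaitable, Callable, Optional

MAX_CONTENT_CHARS = 50_000  # illustrative limit

async def ingest_repository(
    url: str,
    ingest_async: Optional[Callable[[str], Awaitable[tuple]]] = None,
    ingest_sync: Optional[Callable[[str], tuple]] = None,
) -> dict:
    """Ingest a repo, preferring the async API, and truncate oversized content."""
    if ingest_async is not None:
        summary, tree, content = await ingest_async(url)
    else:
        # Fall back to the synchronous API without blocking the event loop.
        summary, tree, content = await asyncio.to_thread(ingest_sync, url)
    if len(content) > MAX_CONTENT_CHARS:
        # Summary and tree are preserved whole; only file content is cut.
        content = content[:MAX_CONTENT_CHARS] + "\n[content truncated]"
    return {"summary": summary, "tree": tree, "content": content}
```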
Configuration and Environment
Google API key is loaded from environment variables for optional file attachment processing
Logging level is configurable via environment variables
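Environment loading might look like the sketch below; the variable names GOOGLE_API_KEY and LOG_LEVEL are assumptions based on the description above:

```python
import logging
import os

def load_config() -> dict:
    """Read optional settings from environment variables."""
    # API key for the optional file-attachment path; None disables the feature.
    google_api_key = os.getenv("GOOGLE_API_KEY")
    # Logging level defaults to INFO when unset or unrecognised.
    level_name = os.getenv("LOG_LEVEL", "INFO").upper()
    level = getattr(logging, level_name, logging.INFO)
    logging.basicConfig(level=level)
    return {"google_api_key": google_api_key, "log_level": level}
```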
The GitHub integration depends on:
FastAPI router for endpoint definition
Pydantic models for request/response validation
LangChain for prompt composition and LLM invocation
gitingest for repository ingestion
Optional cloud generative AI SDK for file attachment processing
Repository content truncation: The crawler enforces a maximum content length to prevent exceeding LLM context windows. This preserves the tree and summary for navigation and structure while limiting large file content.
Asynchronous ingestion: When available, asynchronous ingestion reduces latency in the request path.
Optional file attachments: Uploading and processing attached files adds overhead; use judiciously and ensure the file size remains within SDK limits.
Common issues and resolutions:
Invalid or non-repository URL:
Symptom: Error indicating the URL does not point to a valid repository root
Resolution: Navigate to the main repository page (e.g., github.com/owner/repo)
Repository access errors:
Symptom: Could not access the repository; ensure the URL is correct and the repository is public
Resolution: Verify repository visibility and URL correctness
Repository too large:
Symptom: Token limit exceeded even after truncation
Resolution: Ask about a specific file or directory instead of the entire repository
Attached file processing errors:
Symptom: Failure to process the attached file
Resolution: Confirm the file path exists and the environment has a valid Google API key configured
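The troubleshooting table above maps naturally onto an error-translation helper. The exception classes and message wording here are illustrative, not the service's actual ones:

```python
class InvalidRepoURL(Exception): ...
class RepoAccessError(Exception): ...
class TokenLimitExceeded(Exception): ...

def user_message(exc: Exception) -> str:
    """Translate internal failures into the user-facing messages described above."""
    if isinstance(exc, InvalidRepoURL):
        return ("The URL does not point to a valid repository root. "
                "Please use the main repository page, e.g. github.com/owner/repo.")
    if isinstance(exc, RepoAccessError):
        return ("Could not access the repository. Ensure the URL is correct "
                "and the repository is public.")
    if isinstance(exc, TokenLimitExceeded):
        return ("The repository is too large. Try asking about a specific "
                "file or directory instead.")
    return "An unexpected error occurred while processing the request."
```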
The GitHub integration API provides a streamlined pathway to analyze repositories and answer contextual questions. By normalizing URLs, truncating content, and leveraging a structured prompt pipeline, it delivers reliable, Markdown-formatted answers. Optional file attachment support extends capabilities for multimodal workflows. For production deployments, consider adding authentication, rate limiting, and observability around ingestion and LLM calls.
API Reference
Base URL: /api/genai/github
Method: POST
Path: /
Headers:
Content-Type: application/json
Request body schema:
url: string (HTTP URL; must resolve to a GitHub repository)
question: string (required)
chat_history: array of objects (optional)
attached_file_path: string (optional; absolute path to a local file)
Response body schema:
content: string (Markdown-formatted answer)
Example request payload:
url: "https://github.com/owner/repo"
question: "Explain the main entry point"
chat_history: [] or [{"role": "user", "content": "…"}, …]
attached_file_path: null or "/absolute/path/to/file"
Example response payload:
content: "Markdown-formatted answer…"
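A minimal client sketch using only the standard library; the base URL and payload values are placeholders, and the request is constructed but not sent here:

```python
import json
import urllib.request

payload = {
    "url": "https://github.com/owner/repo",
    "question": "Explain the main entry point",
    "chat_history": [],
    "attached_file_path": None,
}

req = urllib.request.Request(
    "http://localhost:8000/api/genai/github",
    data=json.dumps(payload).encode("utf-8"),
    headers={"Content-Type": "application/json"},
    method="POST",
)
# Sending the request would return a JSON body shaped like
# {"content": "<Markdown-formatted answer>"}:
# with urllib.request.urlopen(req) as resp:
#     answer = json.loads(resp.read())["content"]
```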
Authentication:
Not enforced by the endpoint; secure access according to your deployment needs
Rate limiting:
Not implemented in the endpoint; implement upstream controls as needed
Webhook integration:
Not implemented in this API; integrate external webhooks at the application boundary if required
Repository context extraction:
The service normalizes the URL to the repository root and truncates content to fit within LLM context windows